- As an SRE, you will be responsible for monitoring applications and responding to events, incidents, and changes originating from internal or vendor applications.
- Investigate incidents and problems and determine root cause.
- Analyze existing IT processes and use IaC to automate them.
- Reports to the Director of IT and works to establish operational metrics for our AWS and Azure environments.
- The SRE role will participate in our on-call rotation.
Job Responsibilities
Design, develop, and improve platform services and processes to increase end-to-end reliability and maintainability across all our services.
Create and drive adoption of tools that help deliver insights and automation to simplify the complex world of large-scale services.
Create the infrastructure to support the deployment of Supply Chain in AWS.
Leverage new technology paradigms (e.g., serverless, containers, microservices).
Influence infrastructure architecture by sharing your application development expertise.
Mentor others through design reviews, code reviews, and test cases.
Quickly adapt, apply and train on new technologies, tools, methods, and processes from both internal and external sources.
Be familiar with ITSM methodology for incident response.
Basic Qualifications
Bachelor's degree in Computer Science or 5+ years of professional experience in software development (MS, BE, Computer Science, Site Reliability, etc.).
5+ years of large-scale software development or application engineering with recent coding experience in one or more of the following languages: Java, JavaScript, C/C++, C#, Node.js, Python, or Rust.
Experience in designing and building infrastructure to support applications using container and serverless technologies.
Experience in designing and building infrastructure to support traditional 3-tier applications.
Experience with network technologies such as static routing, BGP, firewalls, WAFs, and DDoS services.
Proficiency in scripting languages such as Bash, Python, and PowerShell.
Experience working with operating systems (Linux, Windows).
Experience supporting infrastructure for large multi-service applications.
Experience working with CI/CD in microservices architectures.
Experience with observability/monitoring tools: Datadog, New Relic, Istio.
Experience working with configuration management tools: Kubernetes.
Experience developing environment documentation and support procedures.
Preferred Qualifications
Understanding of enterprise IT operational capabilities, such as change, release, and incident management, infrastructure management, or application management.
Experience architecting highly available systems that use load balancing and horizontal scalability.
Experience with Agile software development and DevOps practices such as Infrastructure as Code (IaC), continuous integration, and automated deployment.
Experience in adopting chaos engineering techniques to validate system resiliency.
Experience with distributed services, asynchronous messaging architecture, eventual consistency, and telemetry, plus high-scale experience managing and writing services on cloud environments such as Azure, AWS, or Google Cloud Platform.
Company
Judit Inc
United States of America
Location
Remote Position
(From Everywhere/No Office Location)
Job type
Full-Time
Rust Job Details
Title: Site Reliability Engineer (SRE)
Full-time/Permanent
JUDiT Inc.com
Judit is a Certified Woman-Owned Business by the NWBOC
Judit is a Women-Owned Business Enterprise certified by the NYC Department of Small Business Services
NY HQ: One Old Country Rd Ste 384 | Carle Place, NY 11514
FL HQ: 7000 Palmetto Park Rd Ste 302 | Boca Raton, FL 33433